Training Set Properties and Decision-Tree Taggers: A Closer Look

نویسنده

  • Brian R. Hirshman
چکیده

This paper examines three ways to improve part-of-speech tagging accuracy: by increasing the number of training examples presented to the tree learner, by increasing the number of word-specific subtrees grown, and by increasing the number of ngrams (preceding parts of speech) per training example. Though experimental results indicate that additional training data generally leads to the greatest amount improved accuracy, they also demonstrate that including word-specific subtrees can be useful and that trees considering two or more previous parts of speech in their classification decision are superior to those examining just one.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A closer look at rock physics models and their assisted interpretation in seismic exploration

Subsurface rocks and their fluid content along with their architecture affect reflected seismic waves through variations in their travel time, reflection amplitude, and phase within the field of exploration seismology. The combined effects of these factors make subsurface interpretation by using reflection waves very difficult. Therefore, assistance from other subsurface disciplines is needed i...

متن کامل

مطالعات درخت تصمیم در برآورد ریسک ابتلا به سرطان سینه با استفاده از چند شکلی‌های تک نوکلوئیدی

Abstract Introduction:   Decision tree is the data mining tools to collect, accurate prediction and sift information from massive amounts of data that are used widely in the field of computational biology and bioinformatics. In bioinformatics can be predict on diseases, including breast cancer. The use of genomic data including single nucleotide polymorphisms is a very important ...

متن کامل

Determining Factors Influencing Length of Stay and Predicting Length of Stay Using Data Mining in the General Surgery Department

Background: Length of stay is one of the most important indicators in assessing hospital performance. A shorter stay can reduce the costs per discharge and shift care from inpatient to less expensive post-acute settings. It can lead to a greater readmission rate, better resource management, and more efficient services. Objective: This study aimed to ident...

متن کامل

Steel Buildings Damage Classification by damage spectrum and Decision Tree Algorithm

Results of damage prediction in buildings can be used as a useful tool for managing and decreasing seismic risk of earthquakes. In this study, damage spectrum and C4.5 decision tree algorithm were utilized for damage prediction in steel buildings during earthquakes. In order to prepare the damage spectrum, steel buildings were modeled as a single-degree-of-freedom (SDOF) system and time-history...

متن کامل

A Comparison of Three Machine Learning Methods for Amazigh POS Tagging

Part of speech tagging (POS tagging) has a crucial role in different fields of natural language processing (NLP) including Speech Recognition, Natural Language Parsing, Information Retrieval and Multi Words Term Extraction. This paper describes a set of experiments involving the application of three state-of the-art part-of-speech taggers to Amazigh texts, using a tagset of 28 tags. The taggers...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006